ABSTRACT

Two- and three-category versions of Nedelsky's
procedure for setting minimum passing scores, based on item content,
were compared. Graduate students acting as judges classified the
response options on their midterm into two categories: (1) those
which should be rejected as incorrect by a minimally performing (B
average) student; and (2) those which should not. Another group of
classmates was also allowed to categorize options as undecided.
Comparisons of the resulting sets of passing scores were made on the
basis of (1) the raw distributions of passing scores; (2) the
consistency of pass-fail decisions between the two versions; and (3)
the consistency of pass-fail decisions between each version and the
passing score established by the test designer. The two versions
produced essentially equivalent results. There was, in addition, a
significant relationship between the passing score set by a judge and
that judge's level of achievement on the midterm. (Author/CP)
ABSTRACT

Two versions of the Nedelsky procedure for setting minimum
passing scores are compared. Two groups of judges, one using each
version, set passing scores for a classroom test. Comparisons of
the resulting sets of passing scores are made on the basis of (1)
the raw distributions of passing scores, (2) the consistency of
pass-fail decisions between the two versions, and (3) the con-
sistency of pass-fail decisions between each version and the pass-
ing score established by the test designer. The two versions of
the procedure are found to produce essentially equivalent results.
In addition, a significant relationship is observed between the
passing score set by a judge and that judge's level of achievement
in the content area of the test.
1. INTRODUCTION

Passing scores are needed in a broad variety of situations, 
including (a) entrance examinations, (b) tests for advancement of
students from unit to unit in individually prescribed instruc- 
tional programs, (c) minimum competency testing, and (d) certifi- 
cation or licensing examinations. Though writers such as Glass 
(1978) charge that passing scores for minimum competency testing 
are usually selected arbitrarily and frequently used unwisely, 
others (Hambleton, 1978; Shepard, 1976) have documented the need 
for cutoff scores in such areas as objectives-based programs and 
individualized instruction. This paper presumes the practical 
necessity of passing scores and explores ways in which they can 
be established more objectively. 

Procedures for Setting Passing Scores 

Various procedures for setting passing scores or "standards" 
have been developed (see Meskauskas, 1976). Most can be placed 
into one of three broad categories: (a) comparisons with the per-
formance of others, (b) considerations of the consequences of
misclassification, and (c) examinations of item content.
Standard-setting procedures in the first two categories generally 
require actual student response data or assume a theoretical, 
statistical distribution of such data; content-based methods use 
judgements of content experts. Content-based methods frequently 
are used with tests when student performance data are not avail- 
able. 

Methods for determining passing scores by analyzing test con- 
tent require a judge or group of judges to estimate the probable 
score of a hypothetical examinee responding at the level of mini-
mum acceptable performance. Three of the best-known content-based
procedures are those proposed by Angoff (1971), Ebel (1972), and
Nedelsky (1954). In using the Angoff method, each judge estimates
the probability that the "minimally acceptable person" would
respond correctly to each item; the passing score is determined by
summing the estimated item probabilities (Angoff, 1971; Zieky and
Livingston, 1977). In the Ebel procedure, judges sort items into 
categories of "relevance" and "difficulty." Each judge then esti- 
mates the proportion of correct answers in each category expected 
of a "minimally qualified" examinee. The passing score is the 
weighted sum of these proportions, with the weight for each cate- 
gory being the number of items it contains (Ebel, 1972). The 
Nedelsky method is restricted to multiple-choice tests. Every re- 
sponse option is considered by each judge, who decides which op- 
tions could be rejected as incorrect by an examinee performing at 
the minimum passing level. The probability that someone at this 
level would respond correctly to the item is taken to be the re- 
ciprocal of the number of remaining options (i.e., one divided by 
the number of options that the minimally performing examinee
should not be able to reject). The passing score is the sum of 
these reciprocals for all items. (In the original formulation, 
Nedelsky (1954) offers further refinements, such as estimating
the standard deviation of the chance distribution of scores and 
using it in conjunction with setting the passing score. These 
refinements are not considered in this paper.) In all cases, the 
passing score can be expressed as a fraction or percentage of the 
total number of items. 
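
The three computations described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the four-item example judgments are hypothetical and do not come from the studies cited. For the Nedelsky function, each entry is assumed to count the options (including the correct answer) that the minimally performing examinee could not reject.

```python
from fractions import Fraction


def nedelsky_passing_score(remaining_options):
    """Sum of reciprocals of the options left after rejection, one entry per item."""
    return sum(Fraction(1, k) for k in remaining_options)


def angoff_passing_score(item_probabilities):
    """Sum of the judged probabilities of a correct response, one per item."""
    return sum(item_probabilities)


def ebel_passing_score(categories):
    """Weighted sum over (number_of_items, expected_proportion_correct) pairs."""
    return sum(n_items * proportion for n_items, proportion in categories)


# Hypothetical judgments for a four-item, four-option test.
nedelsky = nedelsky_passing_score([1, 2, 4, 2])  # 1 + 1/2 + 1/4 + 1/2
angoff = angoff_passing_score([0.9, 0.5, 0.25, 0.6])
ebel = ebel_passing_score([(2, 0.75), (2, 0.5)])

# Raw passing score and the same score as a fraction of the four items.
print(float(nedelsky), float(nedelsky) / 4)
```

With these made-up judgments the Nedelsky sum is 9/4, or 2.25 of 4 items. Exact fractions are used for the Nedelsky sum so that reciprocals such as 1/3 do not accumulate rounding error before the final score is reported.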

Comparisons of the Application of the Methods

The methods discussed above, though operationally quite 
different, have strong logical similarities. It might seem that 
they could be expected to produce equivalent passing scores. Re-
search reported in the literature indicates that this equivalence
is not always observed. In a study comparing the Ebel and Nedelsky 
procedures, Andrew and Hecht (1976) found that the two standard- 
setting methods produced significantly different passing scores. 
Perhaps an even more important consideration was that 45 percent 
of the examinees being tested were classified differently by the
two passing scores (Glass, 1978). In research utilizing the 
Nedelsky and Angoff procedures, Brennan and Lockwood (1979) also 
reported a substantial difference in the resulting passing scores.

When several judges are used, the variation among judges'
individual passing scores also can become an issue. A certain
degree of variation might be expected. It is usually suggested
that the different passing scores be reconciled either by
averaging the scores or by requiring judges to reach a consensus
passing score. Andrew and Hecht (1976) found that passing scores
obtained by consensus and by averaging did not differ significantly.
In at least one reported case, however, the amount of variation
among passing scores set by a group of judges using the Nedelsky
procedure was substantial, and the procedure was rejected as un-
feasible (Meskauskas and Webster, 1975). The averaging process
treats the variation in passing scores as random or "error" varia-
tion. It might be, however, that differences in passing scores
are related systematically to characteristics of the judges. If
passing scores are to be useful, they should not depend too much
on the characteristics of a particular judge or group of judges.
Such characteristics, once identified, possibly could be con-
trolled to prevent them from exerting an undue influence on the
standard-setting process. One characteristic which intuitively 
might be expected to show such a relationship is the judge's own 
level of achievement in the relevant area. 

Focus of this Paper 

This paper deals only with the Nedelsky procedure. Two ver-
sions of the procedure appear to be in use. In the first version,
judges must classify response options into two categories: (a)
those which should be rejected as incorrect by the minimally per-
forming examinee, and (b) those which should not. In the alter-
native version, a third category, "undecided," also is used when
the judge is unable to classify the response option as one that
either should or should not be rejected. Decisions between the
two versions seem to be based on the preferences of the judges,
rather than any theoretical consideration (e.g., Paiva and Vu,
1979; Smilansky and Guerin, 1976). Nedelsky (1954) discussed the
use of the alternative procedure; he apparently felt the two ver-
sions were equivalent.

The purpose of this paper is twofold. First, a comparison
is made between the two versions of the Nedelsky procedure.
Second, the relationship between the achievement levels of judges
and the passing scores they set is assessed.

2. METHOD 

Subjects 

In order to compare the two versions of the Nedelsky pro- 
cedure, subjects acting as judges were divided into two groups. 
Group A used the two-category version of the procedure to set 
passing scores on an achievement test, while Group B used the 
three-category version. The results were compared using the dis- 
tributions of passing scores, as well as the consistency of 
decisions based upon the scores. Also, to determine the relation- 
ship between judges' achievement and passing score, the correlation 
between measures of the two variables was calculated. 

Data for the study were obtained from students in an intro- 
ductory course in educational research and measurement. The course 
was conducted via videotape at a number of regional campuses of a 
large state university. All subjects were graduate students; many 
were experienced teachers. 

Instrument

The instrument for which passing scores were set, and by
which judges' achievement levels were determined, was the course
midterm examination, a 40-item, four-option, multiple-choice test
constructed by the course instructor (the second author). The
test covered such topics as the nature of the research process, 
observation and measurement, sampling, and item analysis. The 
exam has been revised over several years to reach a high degree 
of content validity, and in its most recent administration showed 
an internal consistency (KR20) reliability index of .82. Thus, 
scores on the test are taken to be valid and reliable measures 
of achievement. 

Treatment Groups 

All students enrolled in the course wrote the midterm exam-
ination as a regular course requirement. The exams routinely were
graded and returned to the students for discussion in class. The 
students then were asked to participate in an exercise involving 
the use of the Nedelsky procedure to determine a passing score for
the test. While participation in the exercise was voluntary, more 
than 95% of the students chose to participate. Of the 148 students 
agreeing to participate, 30 were deleted from the study due to 
failure to follow instructions, missing identification codes, or 
missing achievement data, leaving 118 students as the sample used 
in the experiment. Subjects were assigned randomly to groups, 
stratified by course section to control for possible differences 
among regional campuses. Then they were given copies of the test, 
along with detailed instructions on the Nedelsky procedure. In-
structions for the two groups differed only with respect to the
version of the procedure used. 

Definition of Minimum Competence 

Minimum acceptable performance was defined for the subjects 
as the lowest level of performance on the test for which a grade 
of "B" would be awarded. This level was chosen as appropriate, 
since one of the requirements of the subjects' degree programs is 
that a "B" average be maintained. For each incorrect response 
option on the test, the subjects were instructed to respond to the 
question "Should the student performing at the minimum acceptable
level (as defined above) be able to reject this option as
incorrect?" Spaces were provided for that purpose beside each 
option. For the two-category version (Group A) of the procedure, 
the possible responses were "yes" and "no." The three-category 
version (Group B) also allowed "undecided" as a possible choice. 
In order to minimize any possible confounding effect produced by 
the subjects' knowledge of previously existing course standards,
the subjects were not required to calculate their resulting 
Nedelsky passing scores; this was done by the authors. Each sub- 
ject responded individually; no attempt was made to determine con- 
sensus passing scores. 

Comparison Procedures 

The frequency distributions of passing scores produced by 
the two groups were compared using the Kolmogorov-Smirnov two-
sample test, a broad test sensitive to any difference in the two
distributions. The distributions of passing scores are given in 
Table 1. All passing scores were rounded upward to the nearest 
whole number, that is, the number of correctly-answered items 
necessary for an examinee to be classified as passing. Decision 
consistency was assessed via comparisons of the proportions of 
students writing the exam who were classified similarly by the two 
versions. Both the mean and median passing scores for each group 
were used in the comparisons. The results are shown in Table 2. 
Also, decisions based on the groups' passing scores were compared 
with those based on the standard established by the course in-
structor, as shown in Table 3. Finally, to assess the relation-
ship between judges' achievement and passing score, the Pearson
product-moment correlation coefficient was determined for the
subjects' examination grades and their Nedelsky passing scores. 
For this calculation, the two groups were combined. 
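
A minimal sketch of the rounding rule and the decision-consistency tally follows. The function names and the examinee scores are hypothetical; the group means and medians passed to the rounding function are those reported in Table 1, and an examinee is assumed to pass when the score equals or exceeds the cut score.

```python
import math


def cut_score(raw_passing_score):
    """Round a raw passing score upward to a whole number of items."""
    return math.ceil(raw_passing_score)


def consistency_table(examinee_scores, cut_a, cut_b):
    """Count examinees by (passes under cut A, passes under cut B)."""
    table = {(False, False): 0, (False, True): 0,
             (True, False): 0, (True, True): 0}
    for score in examinee_scores:
        table[(score >= cut_a, score >= cut_b)] += 1
    return table


def proportion_consistent(table):
    """Share of examinees given the same decision by both cut scores."""
    total = sum(table.values())
    return (table[(False, False)] + table[(True, True)]) / total


# Reported group means and medians, rounded upward.
mean_cuts = (cut_score(29.88), cut_score(30.51))
median_cuts = (cut_score(31.17), cut_score(31.37))

# Hypothetical examinee scores, for illustration only.
scores = [25, 30, 30, 31, 35]
table = consistency_table(scores, *mean_cuts)
print(mean_cuts, median_cuts, proportion_consistent(table))
```

Rounding upward turns the two group means into cut scores of 30 and 31, and both medians into 32. Only examinees whose scores fall between two different cut scores can be classified inconsistently, which is why identical rounded medians guarantee complete agreement.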
TABLE 1

Distributions of Passing Scores from Two Versions
of the Nedelsky Procedure

             Frequency                        Frequency
 Score   Group A   Group B        Score   Group A   Group B
   13       0         1             26       2         4
   14       1         0             27       1         0
   15       0         0             28       5         2
   16       2         1             29       4         4
   17       0         1             30       0         1
   18       1         0             31       3         5
   19       0         0             32       5         3
   20       3         1             33       2         3
   21       1         0             34       6        10
   22       1         0             35       6         5
   23       2         2             36       3         2
   24       2         4             37       3         5
   25       1         2             38       5         3

               N      MEAN    MEDIAN    S.D.
 Group A      59     29.88    31.17     6.38
 Group B      59     30.51    31.37     5.79

 Kolmogorov-Smirnov D = .170 (p = .36)
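
The summary rows of Table 1 can be checked against the frequency columns. The sketch below is illustrative: the dictionary and function names are hypothetical, the frequencies are transcribed from the table, and the standard deviation is assumed to be the sample (n - 1) form. Medians are not recomputed here, since the reported medians appear to be based on unrounded passing scores.

```python
import statistics

# Frequencies by passing score, transcribed from Table 1
# (scores with a frequency of zero are omitted).
GROUP_A = {14: 1, 16: 2, 18: 1, 20: 3, 21: 1, 22: 1, 23: 2, 24: 2, 25: 1,
           26: 2, 27: 1, 28: 5, 29: 4, 31: 3, 32: 5, 33: 2, 34: 6, 35: 6,
           36: 3, 37: 3, 38: 5}
GROUP_B = {13: 1, 16: 1, 17: 1, 20: 1, 23: 2, 24: 4, 25: 2, 26: 4, 28: 2,
           29: 4, 30: 1, 31: 5, 32: 3, 33: 3, 34: 10, 35: 5, 36: 2, 37: 5,
           38: 3}


def expand(frequencies):
    """Turn a {score: count} table into a flat list of scores."""
    return [score for score, count in frequencies.items() for _ in range(count)]


def summarize(frequencies):
    """Return (N, mean, sample standard deviation) for a frequency table."""
    scores = expand(frequencies)
    return len(scores), statistics.mean(scores), statistics.stdev(scores)


for name, freq in (("Group A", GROUP_A), ("Group B", GROUP_B)):
    n, mean, sd = summarize(freq)
    print(f"{name}: N = {n}, mean = {mean:.2f}, S.D. = {sd:.2f}")
```

Under these assumptions the recomputed N, mean, and standard deviation agree with the summary rows for both groups to two decimal places.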







3. RESULTS 

The overall passing score distributions for the two groups, 
displayed in Table 1, showed no significant difference (p = .36).
As can be seen in Table 2, the two forms also produced highly
consistent classification decisions. If the mean passing score
for each group is used as a standard, only 7 of 185 students taking
the test would have been classified differently, a percentage of
agreement of 96%. The exact median passing scores from the two
groups are 31.17 and 31.37, respectively. Rounding upward, both
these values become 32. Thus, use of the median passing score
produced the surprising result of complete agreement in classifi- 
cation. 
The fact that the two versions produce passing scores yielding
consistent decisions does not, in itself, mean that the scores are
useful in practice. But further comparisons of decisions based on
the Nedelsky passing scores with those based on standards previous-
ly established by the course instructor (32 correct answers for a
grade of B) also show a high degree of agreement (Table 3). Using
the group mean passing score as the standard, 11 of 185 students
were classified differently by Group A (the two-category version)
and the course instructor's pre-set standard (percentage agreement
= 94%). For Group B (the three-category version), this percentage
was 98% (4 students classified differently). The group medians,
rounded up to 32, coincide exactly with the course instructor's
standard. Here again, use of the group medians produced complete
agreement.
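
The agreement percentages for the mean-based cut scores follow directly from the cell counts in Tables 2 and 3. The sketch below recomputes them; the function name and the argument order (both fail, two disagreement cells, both pass) are illustrative assumptions rather than part of the original analysis.

```python
def percent_agreement(fail_fail, fail_pass, pass_fail, pass_pass):
    """Return (proportion classified alike, number classified differently)."""
    total = fail_fail + fail_pass + pass_fail + pass_pass
    return (fail_fail + pass_pass) / total, fail_pass + pass_fail


# Cell counts for the mean-based cut scores (Case I of Tables 2 and 3).
cases = {
    "Group A vs. Group B": (44, 7, 0, 134),
    "Instructor vs. Group A": (44, 11, 0, 130),
    "Instructor vs. Group B": (51, 4, 0, 130),
}

for label, counts in cases.items():
    proportion, differing = percent_agreement(*counts)
    print(f"{label}: {proportion:.2f} agreement, {differing} classified differently")
```

Each comparison covers the same 185 examinees, so the three proportions are 178/185, 174/185, and 181/185.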

As was noted previously, subjects in both groups were com-
bined to consider the relationship between judges' achievement and
passing score. Such a relationship, if it exists, might be expect- 
ed to hold across methods; in any event, the demonstrated equiva- 
lence of the two forms suggests the reasonableness of combining the 
two groups. The linear correlation between achievement and passing 
score for the subjects of the study was .30 (p = .001). Thus
achievement in the subject matter area accounted for 9% of the ob- 
served variation in passing scores. 
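
The step from a correlation of .30 to 9% of the variation is the squared correlation (the coefficient of determination). A minimal sketch, using a hand-written Pearson function and made-up data pairs purely for illustration (neither the function names nor the data come from the study):

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def variance_explained(r):
    """Proportion of variance in one variable accounted for by the other."""
    return r ** 2


# Reported correlation between achievement and passing score.
print(round(variance_explained(0.30), 2))

# Hypothetical (exam grade, passing score) pairs, for illustration only.
grades = [28, 31, 33, 36, 38]
passing = [27, 32, 30, 34, 33]
print(round(pearson_r(grades, passing), 2))
```

Squaring .30 gives .09, so roughly 91% of the variation in passing scores remains unrelated to the judges' measured achievement.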

4. DISCUSSION

From the results of this study, the two- and three-category 
versions of the Nedelsky procedure yield equivalent results. 
The finding holds both in terms of the empirical distributions of 
passing scores, and of consistency in classification decisions. 
Additionally, there was a close correspondence both in distribu- 
tions of passing scores and in classification decisions between 
passing scores set by the subjects and the pre-set standard es- 
tablished by the course instructor. 
TABLE 2

Decision Consistency of Passing Scores from
Two Versions of the Nedelsky Procedure

Case I: Using the mean of several judges.

                         Group A
                      fail     pass
   Group B    fail     44        7       51
              pass      0      134      134
                       44      141      185

   Proportion of consistent decisions = (134 + 44) / 185 = .96

Case II: Using the median of several judges.

                         Group A
                      fail     pass
   Group B    fail     55        0       55
              pass      0      130      130
                       55      130      185

   Proportion of consistent decisions = 1.00
While either the mean or median of several judges' passing scores
could be used to set the final passing standard, the median, rather
than the mean, might be more appropriate. The median's resistance
to the influence of extreme scores would seem to reduce some of the
effect of variability in passing scores from a group of judges.
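
The median's resistance to extreme scores can be illustrated with a small made-up panel of judges. The passing scores below are hypothetical and are not taken from the study; only the rounding-upward rule comes from the procedure described earlier.

```python
import math
import statistics

# Hypothetical raw passing scores from five judges; the first is an outlier.
judge_scores = [13.0, 30.5, 31.0, 32.5, 33.0]

mean_cut = math.ceil(statistics.mean(judge_scores))
median_cut = math.ceil(statistics.median(judge_scores))

# Dropping the outlier moves the mean-based cut far more than the median-based cut.
trimmed = judge_scores[1:]
trimmed_mean_cut = math.ceil(statistics.mean(trimmed))
trimmed_median_cut = math.ceil(statistics.median(trimmed))

print(mean_cut, median_cut, trimmed_mean_cut, trimmed_median_cut)
```

In this example a single low judge pulls the mean-based cut score down by four items, while the median-based cut score changes by one.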

Some variation was observed in the scores from both groups of 
judges. The slightly smaller standard deviation of passing scores 
from Group B, using the three-category version of the procedure, 
might be a point in favor of the use of that version. The signi- 
ficant positive correlation between judges' achievement and pass- 
ing score indicates that at least a small portion of the observed 
variation in passing scores was related systematically to a 
characteristic of the judges. Other relevant characteristics might 
be identified which also relate systematically to judges' passing 
scores. Knowledge of these characteristics and their relationship 
to passing scores could lead to their elimination, control, or 
utilization in the standard-setting process. This knowledge would 
make the setting of passing scores on the basis of expert judgement 
a more objective process. 

In conclusion, this study has shown that the two versions of 
the Nedelsky procedure considered here produce equivalent passing 
scores. Also, it was shown that the passing scores set by differ- 
ent judges were related positively to the judges' own achievement. 
It should be noted that the study involved the setting of passing 
scores for a single test, using as judges students who took the 
test but who were not responsible for constructing it. Further, 
such judges are not likely to have the broad knowledge of other 
students, of how such tested content fits into the total curri- 
culum, and of the subject-matter itself which, say, faculty 
members might have. It is an open question whether faculty 
members would tend to show the same pattern of consistency in 
applying the two Nedelsky methods. Thus the observed results must 
be seen as suggestive rather than conclusive. However, given the
results of this study, a choice between the two versions justifi-
ably could be made on practical grounds, such as the preference of
the judges.

TABLE 3

Decision Consistency of Course Instructor's Standard with
Passing Scores from Two Versions of the Nedelsky Procedure

Case I: Using the mean of several judges.

                             Group A                    Group B
                          fail    pass               fail    pass
   Instructor's   fail     44      11      55         51       4      55
   Pre-set
   Standard       pass      0     130     130          0     130     130
                           44     141     185         51     134     185

   Proportion of consistent decisions:
      Group A: (130 + 44) / 185 = .94
      Group B: (130 + 51) / 185 = .98

Case II: Using the median of several judges.

                             Group A                    Group B
                          fail    pass               fail    pass
   Instructor's   fail     55       0      55         55       0      55
   Pre-set
   Standard       pass      0     130     130          0     130     130
                           55     130     185         55     130     185

   Proportion of consistent decisions:
      Group A: (130 + 55) / 185 = 1.00
      Group B: (130 + 55) / 185 = 1.00